Neural Networks as a Composition Diagnostic 
for Ultra-high Energy Cosmic Rays 



Andre K. 0. Tiba* , Gustavo A. Medina- Tanco 

l/"") ' Instituto de Astrondmia, Geofisica e Ciencias Atmosfericas 

Universidade de Sao Paulo 
^ ! Caixa Postal 9638 - CEP 05508-900, Stio Paulo - SP, Brasil 

! Sergio J. Sciutto 

Departamento de Fisica and IFLP/CONICET, Universidad Nacional de La Plata, 

C. C. 67, 1900 La Plata, Argentina 

> ' 

in ; 

in . 

CN ■ Abstract 

(N ; 

■ We analyze here the possibility of studying mass composition in the Auger data 

. sample using neural networks as a diagnostic tool. Extensive air showers were sim- 

ulated using the AIRES code, for the two hadronic interaction models in current 
use: QGSJet and Sibyll. Both, photon and hadron primaries were simulated and 
used to generate events. The output parameters from the ground array were sim- 
ulated for the typical instrumental and environmental conditions at the Malargiie 
Auger site using the code SAMPLE. Besides photons, hydrogen, helium, carbon, 
oxygen, magnesium, silicon, calcium and iron nuclei were also simulated. We show 
that Principal Components Analysis alone is enough to separate individual photon 
from hadron events, but the same technique cannot be applied to the classification 
of hadronic events. The latter requires the use of a more robust diagnostic. We show 
that neural networks are potentially powerful enough to discriminate proton from 
iron events almost on an event-by-event basis. However, in the case of a more real- 
istic multi-component mixture of primary nuclei, only a statistical estimate of the 
average mass can be reliably obtained. Although hybrid events are not explicitly 
simulated, we show that, whenever hybrid information in the form of X max is intro- 
duced in the training procedure of the neural networks, a considerable improvement 
can be achieved in mass discrimination analysis. 
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1 Introduction 



The origin of ultra-high energy cosmic rays (UHECRs) is currently one of the 
most exciting problems in modern astrophysics. Up to now, no astrophysi- 
cal object is known that could accelerate charged particles to such energies. 
Besides, if the sources are located on cosmological distances, then it would 
be expected that the Cosmic Rays arriving to the Earth would loose energy 
after interacting with the cosmic microwave background above an energy of 
about 6 x 10 19 eV [1], the threshold for photo-pion production in interactions 
with the cosmic microwave background radiation (CMBR). This energy would 
therefore mark a sharp bend in the Cosmic Rays energy spectrum. Is is not 
clear at present whether such bent is seen by the experiments operating so far. 
If the sources are nearby, then an anisotropic distribution of arrival directions 
is expected because in this case the directions of arrival would point to the 
sources, and this does not seem to be the case either. 

Alternative explanations of the existence of the UHECRs have been developed 
over the last few years. Indeed, a whole spectrum of models exists, ranging 
from the more conservative bottom-up models, in which nuclei are accelerated 
up to the highest observed energies [2] , to the more radical top-down scenarios 
where either new particles or exotic phenomena are invoked to produce the 
particles at already higher energies than those measured ([3] for example). 

The discrimination among these models, requires an accurate determination 
of the energy spectrum, the distribution of arrival directions and the identity 
of the particles. 

Additionally, accuracy must be complemented with high statistics. During the 
next few years, it is expected that at least two major experiments will attain 
this requirements: Auger South and North [4] and the Extreme Universe Space 
Observatory, EUSO [5]. 

In this work will concentrate on the specific characteristics of the Auger ex- 
periment. Nevertheless, the following analysis might be conceptually extended 
to any other air-shower-based ultra-high energy cosmic ray experiment. 

The Auger experiment consists of two detectors of about 3000 km 2 each, lo- 
cated at sites in the Southern and Northern hemispheres respectively. The 
Southern observatory is being deployed at present in Malargiie, Argentina, 
and is already taking data as construction proceeds. Each detector will be 
capable of measuring the properties of the showers generated by the ultra 
high energy cosmic rays. An array of surface detectors (SD) will measure the 
lateral distribution of the shower at ground level, while fluorescence detec- 
tors will measure the longitudinal distribution of the shower traversing the 
atmosphere. 
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The development of extensive air showers (EAS), as characterized by their lat- 
eral distribution, curvature of the shock front, rising time, pulse shape, total 
number of photoelectrons, etc., carries information regarding the direction, 
energy and identity of the incoming primary. However, while direction and 
energy can be estimated rather easily from ground array data (e.g. [6]), the 
definition of a convenient and efficient diagnostic for primary identity discrim- 
ination remains a challenging issue. 

In particular, besides some punctual indications against UHE photons as pri- 
maries [7,8,9], only one comprehensive study limiting the photon flux above 
10 19 eV has been published [10] up to now, and it is based on measuring the 
ratio of vertical and inclined showers at Haverah Park (zenith angles > 60°). 
The separation between light (protons) and heavier (Fe nuclei) hadrons is still 
much more difficult. 

Given the present uncertainties, results so far remain mostly qualitative and 
it is likely that such a complex problem will not be solved by the use of a 
single technique. 

In this paper we present the results of an ongoing effort to develop primary 
identification diagnostics with the aid of multivariate and neural network tech- 
niques. A pragmatic approach is taken to the practical problem of statistically 
determining the identity of the primaries starting EAS at the top of the atmo- 
sphere with the ground array of the Auger Observatory as the specific target. 

The paper is structured as follows. Section two describes the data set, built 
from simulations of extensive air showers and their surface detection. In section 
three we show and apply the Principal Component Analysis (PCA) method to 
separate photons and hadrons. In section four after a summary introduction 
to Neural Networks (NNs) we apply them to the classification of proton and 
iron nuclei (on an event-by-event basis) and to the problem of classification 
in the case of a continuous mass spectrum which is approximated by an eight 
nuclei cosmic ray flux. Section five summarizes our results and conclusions. 



2 EAS simulations and detection 

A large sample of showers for primary photons and hadrons was generated 
with the AIRES code and, transformed into ground array events of a model 
Auger Observatory, simulated with the SAMPLE code [6]. 

The AIRES system is one of the widely used codes for EAS simulation cur- 
rently in use. All the relevant particles and interactions are taken into ac- 
count during the simulations, and a number of observables are measured and 
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recorded, among them, the longitudinal and lateral profiles of the showers, the 
arrival time distributions, and detailed lists of particles reaching ground that 
can be further processed by detector simulation programs. The AIRES system 
is explained in detail elsewhere [11]. 

The showers processed in this work consist of: (i) a series of 1831 proton, 
gamma, and iron showers, processed with AIRES 2.4.0 and QGSJet 98 hadronic 
interaction model, with energies in the range 10 17 - 5 eV to 10 20 - 5 eV, and zenith 
angles in the range to 60 degrees; (ii) a series of 10,000 proton, He, C, O, 
Mg, Si, Ca and Fe showers, with energies in the range 10 17 - 5 eV to 10 20 ' 9 eV 
and zenith angles in the range to 84 degrees, processed with AIRES 2.5 and 
both QGSJet 01 [12] and Sibyll 2.1 [13]. Each shower was reused 20 times at 
different location in the array, and so the final number of available events is 
36,620 for set (i), and 200,000 for set (ii). The surface detectors have been 
simulated using the SAMPLE SD simulation program. 

The directly observable outputs for each event, which include the number and 
spatial distribution of triggered tanks and the time profile of the signal at 
each station (the fluency times T w , T 50 and T 90 ), together with elementary 
reconstructed quantities (e.g., primary energy E Q , zenith angle 9) are used 
to define different sets of parameters. The energy and zenith angle were not 
reconstructed, but the input parameters to shower simulations were directly 
used. 

The number of ground events (measured with only the surface detector ar- 
ray) are roughly ten times the number of hybrid events (measured with both 
techniques, surface and florescence). Therefore, it is important to evaluate 
the improvement on classification efficiency obtained from the use of hybrid 
events. A good idea on that respect can be obtained, without resorting to full 
simulation of the longitudinal development reconstruction, by simply adding 
the X max parameter to the set of ground array parameters. Hence, whenever 
we want to assess the potential of hybrid events, we include the X rnax value 
calculated by AIRES without performing the fluorescence reconstruction of 
the showers. 



3 Principal Component Analysis: photon-hadron separation 

The full set of directly measured and reconstructed quantities, can be com- 
bined to form an n-dimensional orthogonal parameter space. This space can 
be further studied by using Principal Component Analysis (PCA), in search 
for primary separation. 

The PCA method (for example, [14]) simply performs a rotation in the n- 
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dimensional space to a new orthogonal coordinate system whose unit vectors 
are the eigenvectors of the system. These new axis have a special meaning, 
since their associated eigenvalues are a measure of the dispersion of the data 
along each axis. Thus, the principal eigenvector has the largest associated 
eigenvalue, and therefore the largest dispersion, or information content, of the 
sample; the second eigenvector has the second largest dispersion and so on. 
Typically, one can quantify the amount of information associated with a subset 
of axis, and can even expect to uncover the true dimensionality of the system 
if this has been overestimated. 



One advantage of the PCA method is that, involving only rotations, the new 
axis are only linear combinations of the original magnitudes. 

As an illustrative example [15], lets take a parameter space defined arbitrarily 
by: 



A (sort of) curvature estimator, 

(To t ext) ~ (Tojnt) 



Pi 



(r e xt) ~ {Tint) 



x sin# 



(1) 



where the subscripts "ext" and "int" refer to stations that are farther away and 
nearer the shower axis than the median distance r c of the triggered stations, 
and r ext and r int are the average distances inside each region. 



The third largest total number of vertical equivalent muons, 

P 2 = [{vem) total ] Zrd 



(2) 



The pulse shape/rising time (average), 
P = ( Tso ) 

1 10 + 1 50 



(3) 



where Tj are the fluency times for 10% and 50% of the total fluency at a given 
station. 



The pulse shape/rising time (3rd largest value), 
Pa = (Tio + T 50 + T 90 ) 3rd 



(4) 



A lateral distribution estimator, 



(vem) total 



5th 



(vem tota i 



(5) 



3rd 
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The 3rd largest value of the rising time, 

Pe = (^#) (6) 

\ 1 90 — 1 / 3rd 



plus the median of the station distances to the axis of the shower Pj = r c , the 
primary energy P$ = Eq, the zenith angle Pg = 9 and the number of triggered 
stations P w = N stat . 

All these parameters are normalized later so that their dynamical ranges are 
in the interval (—1,1). 

When a PCA analysis is performed in this parameter space for the simulation 
set (i) of proton, iron and gamma primaries, it is found that the first 4 eigen- 
vectors are responsible for ~ 80% of the variance (or information content) of 
the system. The 7th eigenvector is responsible for only ~ 6 % of the variance. 



In this particular parameter space, the best separation between nuclei and 
photons is obtained for the projection onto the plane defined by the first and 
seventh eigenvectors (see Fig. 1). The thick line, EVj = —48.89 x (EV\ + 
0.007) 2 + 0.011, in Fig. 1 is able to separate very well nuclei from gamma 
populations: the probabilities of misidentification are ~ 3% and ~ 4% for 
photons and nuclei, respectively. 
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Fig. 1. PCA results on the illustrative parameter space. The best separation between 
nuclei and photons is obtained for the projection on the plane defined by the first 
and seventh eigenvectors. The thick line misclassifies 4% of the nuclei as photons 
and 3% of the photons as nuclei. 



Once the photons have been separated, the same process could be applied, in 
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principle, to the further separation of nuclei among themselves. However, this 
is a much more complicated problem as can be seen in Fig. 1 or even in Fig. 2, 
which show that proton and iron nuclei are truly mixed in these parameter 
space regardless of what projection is chosen for the analysis. The necessity 
of a more powerful technique is obvious and, as we will show in the following 
section, a potentially useful diagnostic tool can be obtained by resorting to 
neural network simulations. 




P 4 (~ pulse shape) 



Fig. 2. Projection onto the P\—P± plane of the sample points, once the photon events 
have been extracted, showing the difficulty involved in the separation of light and 
heavy nuclei. 



4 Neural Networks applied to nuclei separation 



An alternative approach for hadronic primary separation can be obtained by 
applying neural network technics to the problem. 



4-1 Neural Network design 



A neural network (NN), is structured in parallel layers of neurons, connected 
to neurons in adjacent layers. Each connection has a statistic weight (or just 
weight) since two neurons of the same layer, should not "see" the same input 
coming from a neuron of the previous layer. 

In general terms, a NN is formed by: the input layer, which is connected to 
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the input data vector; an indefinite number of hidden layers and the output 
layer, the last layer of neurons (see Fig. 3). 




Input vector Input layer 



Hidden layers 



Output layer Output 



Fig. 3. A general network, with an input layer, an hidden layer (multi-layer) and an 
output layer. Each square box represents a neuron unit showed in Fig. 4. 

The elementary processing unit in a NN is the artificial neuron. The informa- 
tion arrives to the neuron from many input channels. The information coming 
from each channel is linearly transformed by applying a multiplicative weight 
and an additive bias and fed to a transfer function which gives the neuron 
output signal (see Fig. 4). The bias parameter b regulates the threshold ac- 
tivation of the neuron (for boolean neurons). But in our case, the bias is one 
more parameter to be adjusted by the training algorithm. 



Fig. 4. A single neuron from Fig. 3: the input vector p, the weights wi (i = l,...,R), 
the bias b, the transfer function / and the output signal a. The weight Wi shows 
which importance each input signal has for the neuron. 

The training algorithms that we tested, required neurons with differentiable 
transfer functions. After tests with several transfer functions, we selected ba- 
sically two types: (a) linear: 



inputs 
Pi. 



output 




f(n) = n 



(7) 



and (b) hyperbolic tangent sigmoid: 



where: 



R 



n = 



+ b 



(9) 



i=l 



In our case, the last layer of all networks has one neuron with linear transfer 
function. All other neurons of the network have hyperbolic tangent sigmoid 
as transfer function. 

For simplicity, we adopt hereafter the following notation in order to represent 
a network. The notation 7i-10-10-10-lo, for example, means: one input data 
vector of dimension 7; three layers, each one with 10 neurons (all hyperbolic 
tangent sigmoid), and the output layer with one neuron(linear). This network 
has 311 free parameters: 280 weights and 31 biases. Networks of very different 
architectures and sizes were tested, running from approximately 80 to more 
than 1100 weights. 

The adopted solution was a supervised training algorithm. The training data 
set $, is a collection of Q events, $ = {{p q , y q }} q =i where p q is the input 
data vector (the measured parameters that characterize the event) and y q is 
the identity of the event (atomic mass of the primary nucleus). The super- 
vised training minimizes the difference between the desired output y q and the 
computed output a qi adjusting the weights and biases of the NN. 

The minimization function is generally the square error function 

Q 



where x is the vector formed by the weights and biases of the NN. 

The basic backpropagation algorithm [16] was developed for networks with 
multiple layers of neurons. This is a feedforward algorithm, whose main char- 
acteristic is that the process of weight adjusting progresses from the last to 
the first neuron layer. Some of the backpropagation algorithms we tested use 
equation (10) as their minimizing function. 

A frequent problem in network training is the lost of generalization capability. 
The algorithm attains a good training level (i.e., low values of -F(x)) but it 
is not capable of reproducing the same good performance when faced to an 
independent test set. This problem can often be traced to the development 
of some few large weights during the process of minimization of the error 
function. These large weights are responsible, in turn, for a very sensitive NN, 
which can react unpredictably when presented with inputs that depart, even 
slightly, from the original training set. 



F(x) = EMx)-a,) 2 



(10) 
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An attempt to solve this problem, used in the present work, is the Bayesian 
Regularization Backpropagation Algorithm (BRBA), described by [17] and 
detailed in [18,19]. Its minimization function is a modified version of (10), 
aimed at improving the network generalization capability: 

F(x(k)) = a(k) f>, ~ <k)f + (3(k) jr x (kf 

q=l i=l 

= a{k)E er {k) + (5{k)E x {k) (11) 

where Q is the number of weights of the NN, a and (5 are parameters that 
change at each iteration k. 

Roughly speaking, the term E er is responsible for the network learning, while 
E w is correlated with its generalization capability because it keeps checked 
the value of sum of the square of the weights. By minimizing both terms 
simultaneously, E er and E w , one expects an equilibrium between learning and 
generalization capability. 

In all cases after training our networks with a given training data set, their 
generalization capability was checked with an equivalent independent control 
set. 



4-2 Two components UHECR flux: p-Fe separation 

Given a specific problem, there are no rules to design on optimal NN. Each 
problem is a case study in itself. Therefore, many different NN architectures 
were tested. Several backpropagation algorithms, number of layers, number of 
neurons in each layer, transfer functions, number of iterations and input sets 
of parameters were tried until acceptable results were attained [20]. 

The best results were obtained by first transforming the parameter space into 
its eigenvector space employing PCA techniques (reducing at the same time 
the dimensionality of the input data). As a first example, in Fig. 5, we show 
the results for a feedforward network 7i-10-10-10-lo. 

The input parameters used, Pi, based in direct observables and reconstructed 
magnitudes from the surface array detector: 

\N stat J ^ 1 >l \ 1000m; 

10 



(12) 



p 2 



y T (J^L-\ (13) 



Nstat) ti w viooomy 



E T m, (14) 



p 4 = 



) 1 * stat 
E ^.i, (15) 



Nstat y i = \ 



Nsta 



E T 90„ (16) 



N s tat / j = x 



plus initial energy (Pg = -^o), zenith angle (P7 = #) and number of triggered 
stations (P 8 = N stat ); where, for the station i: (vem)i is the signal measured 
in vertical equivalent muons, T sp ^ is the arrival time of the shower plane, r i 
is the distance to the shower axis and Tj yi ( j = 1 , 50, 90) are the rising times, 
at which j% of the integral value of the signal is attained. 

A set of cuts was applied to remove events too far from the average behavior of 
the parameter distributions. The cuts, described in Table 1, eliminate roughly 
1/4 of the events. 

Table 1 

The cuts eliminate roughly 1/4 of the events. The remaining 3/4 events were ran- 
domly split into training and independent control sets. 



Cuts 



40 < Pi < 3 x 10 5 
7 x 10~ 3 < P 2 < 20 
3 x 10~ 2 < P 3 < 80 
3 < P 5 < 250 

P 8 = Nstat > 4 



The network was trained to output desired for protons and 1 for iron nuclei. 
The training set had 10000 events (5000 for each kind of nucleus). 

Fig. 5 shows the result of applying the trained network to independent control 
samples of 13000 events each one. 

If we assume that an output < 0.5 identifies the event as proton generated and 
an output> 0.5 signals to an iron nucleus, then the misclassification probabil- 
ities for proton and iron nuclei are, respectively, P p ~ 27% and Pp e ~ 16%. 
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Fig. 5. Results of the application of a trained feedforward network to an independent 
control sample of 13000 events triggered by protons and iron nuclei. The network was 
trained to output a value of zero (one) for a proton(iron) primary. Tails, therefore, 
correspond to misclassified events. Only surface array information was included. 

In order to asses the impact of using information coming from hybrid events, 
we performed an additional run including also X max . The corresponding out- 
put is showed in Fig. 6. A noticeable improvement shows up clearly as the 
misclassification probability goes down to ~ 12% for protons and ~ 12% for 
iron. Furthermore, the number of ambiguous events with intermediate results 
between and 1 diminishes noticeably producing a cleaner output. 



8000 



6000 



= 4000 



"2000 



4000 



proton 



0.5 1 

Network output 




0.5 1 

Network output 



Fig. 6. Same as Fig. 5, but now hybrid events were considered (basically through 
the inclusion of X max ). A much clearer separation is obtained, despite some events 
are still misclassified. 

Since each shower was reused 20 times to simulate ground events, those events 
may not be completely independents themselves. This could result in some 
artificial improvement of the network response. To check this possibility, we 
eliminated the shower repetitions, lowering the number of available events by 
a factor of 20, but guaranteing the independence of the events. 

We use the same network architecture as before, but this time only one cut 
was applied, P 8 = N stat > 4, since additional cuts would eliminate too many 
events hindering the analysis. The training set had 900 events (450 of each 
kind of nucleus) and the independent control samples had roughly 480 events. 



The misclassification probabilities for proton and iron nuclei were, respectively, 
P p ~ 26% and Pp e ~ 20% for surface events. The same misclassification 
probabilities, for hybrid events, were P p ~ 13% and Pi? e ~ 13% for proton and 
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iron nuclei. Therefore, there was no artificial improvement in network results 
because of the not fully independence of the events in the former sample. 
Note also that this result holds even when the second, completely independent 
sample, had less cuts in parameter space, which should worsen the results. 
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Fig. 7. The proton(iron) control samples simulated by the network are in left(right) 
histograms. The top(down) four histograms were simulated with Sibyll(QGSJet) 
events. The networks trained with Sibyll events had thinner and clearer output 
distributions. 

In order to maximize the number of events available, training and control data 
sets of mixed events, of QGSJet and Sibyll interaction models, were used in the 
previous examples, which hampers the performance of the networks. From now 
on we show networks simulated with either QGSJet or Sibyll events. Special 
care was put in selecting equivalent samples of QGSJet and Sibyll events with 
the same distribution in initial energy E and zenith angle. 

The training set had 6000 events (3000 of each nucleus) and the control in- 
dependent samples had 2500 events of each nuclei. The same network archi- 
tecture 7i-10-10-10-lo and the same procedures used before were applied. The 
results of simulating QGSJet and Sibyll events separately are shown in Fig. 7. 

The four upper histograms are the outputs of networks simulated with Sibyll 
control samples while the four lower histograms correspond to networks sim- 
ulated with QGSJet control samples. Results are presented for surface and 
hybrid events. The networks simulated with Sibyll events shows thinner pics 
around the desired outputs (0 for proton and 1 for iron) and their distributions 
have a smaller number of ambiguous events than in the case of QGSJet. 



The misclassification probabilities of proton and iron nuclei, with surface and 
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hybrid events are in Table 2. 
Table 2 

The misclassification probabilities of proton(P p ) and iron (i-*Fe) nuclei for networks 
simulated with control independent samples of different hadronic models. 



Hadronic model 


Detection 


Pp(%) 


P Fe (%) 


Sibyll 


Hybrid 


1 


1 


Sibyll 


Surface 


12 


10 


QGSJet 


Hybrid 


8 


9 


QGSJet 


Surface 


15 


14 



As expected, the results for the misclassification probabilities are much better 
than in the previous examples, where mixed hadronic interaction models were 
used. 

Table 2 shows that the improvement in misclassification probabilities when go- 
ing from surface to hybrid events is, for Sibyll events, one order of magnitude. 
For QGSJet the improving is roughly a factor of two. 

In the case of pure surface events, QGSJet and Sibyll misclassification proba- 
bilities are approximately equivalent, with a slight advantage in favor of Sibyll. 
Nevertheless, in the case of hybrid events, those probabilities are very differ- 
ent, being a factor of 10 smaller for Sibyll than for QGSJet. The QGSJet 
hadronic model has a higher multiplicity than Sibyll, which produces a larger 
X max dispersion (0"x maa .) and more fluctuations [21]. Apparently, the networks 
had more difficult in training and recognition of these noisier data sets. 

In order to study the effect of our lack of knowledge about the true hadronic 
interaction model operating in nature at these energies, we trained networks 
with events simulated with one hadronic model and tested them afterwards 
with events simulated with the other hadronic interaction model. The same 
network architecture was used as before. The corresponding misclassification 
probabilities of control samples are shown in Table 3. 

Table 3 shows the important result, that NN should be able to classify primary 
identity correctly, at least in 70 % of the events, independently of the hadronic 
interaction model used in the simulations. 

Furthermore, if the true hadronic interaction model is unknown, the safest bet 
is to train with QGSJet and, when possible, to limit the analysis to hybrid 
events. 

The tests performed on binary mixture of proton and iron nuclei show that 
NN are capable of acceptable classification almost on an event by event bases. 
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Table 3 

The same as in Table 2, but different hadronic interaction models were used for 
training and testing. 



Hadronic model 

Trained Tested Detection P p (%) P Fe (%) 

Sibyll QGSJet Hybrid 18 15 

Sibyll QGSJet Surface 27 21 

QGSJet Sibyll Hybrid 6 10 

QGSJet Sibyll Surface 21 17 



However, a binary model for the flux of UHECR, formed only by proton and 
iron nuclei, is not realistic. If heavy nuclei, say iron, arrive at the top of 
atmosphere, lighter nuclei should arrive too, even if only due to photodisin- 
tegration via interactions with the infrared background in the intergalactic 
medium. Therefore, we expanded the same NN analysis to a group of eight 
different nuclei. 



4-3 Multi- component UHECR flux: 8 different nuclei 



A flat mass spectrum, composed by a mixture of 8 nuclei, was used for training: 
X H, 4 He, 12 C, 16 0, 24 Mg, 28 Si, 40 Ca and 56 Fe. 

As in the previous case, we tested different NN architectures, learning and 
training algorithms and sets of parameters. This time, unfortunately, we were 
unable to design a NN capable of adequately separating individual events. 

Consequently, we limited our attempts to a statistical characterization of the 
mass spectrum. 

We were forced to apply additional cuts to the sample of events, based on 
a series of curve fittings to the lateral distribution of the individual showers. 
The curve fittings characterize the shape of the lateral distribution function, 
the curvature of the shower, and the radial dependence of the time structure 
of the shower front: 

/ r \ bPe 

(vem) - {vem) 90 , mm = a pe [j^J ( 17 ) 

<«« = "(ikd)" (18) 
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ho — ho,min ~ a 50 ( i nnn ) (19) 



1000 



HO 



ho — tgo.min — ago ( Yqqq ) 



The cuts were applied on the bases of the value of some of the fitted parameters 
in eq. (17)-(20). The new cuts are given explicitly in Table 4. Again, only high 
flyers were eliminated. Restrictions were also applyed on the linear regression 
of the various fittings, p«. We added an energy cut in energy, similar to Auger 
energy threshold. 

Table 4 

The cuts of Table 1 and Table 4, applied together, eliminate roughly 2/3 of the 
events. 



Cuts 


E > 10 19 eV 


-5 < 6 pe ,j < 


< bo,i < 7 


< b 50 ,i < 3 


< b 90 ,i < 2.5 


Pi > 0.85 



Since event-by-event classification proved too unreliable, we limited ourselves 
to the statistical determination of the average atomic mass of different com- 
position spectra. As an example we used, arbitrarily, simulated composition 
spectra of the form dN/dA oc A u , with v = -0.5, 0.0, 0.5, 1.0 and 2.0. 

For each value of v, 1000 mass spectra of 300 events each, were used to calcu- 
late the distribution function of the estimated average mass. 

The median output of the network for each nucleus was re-scaled to its actual 
atomic mass by fitting an exponential function. The criterion used to select 
among different neural networks was, simultaneously, the goodness of the ex- 
ponential fitting and the dynamical range of the output between proton and 
iron. 

Fig. 8 shows the estimated average masses for different spectral indices v for 
both, hybrid (a) and pure surface events (b). The vertical thick lines are the 
true values of the average spectral mass for each v. 

We used the set of parameters given by equations (12-16) plus energy, zenith 
angle and number of triggered stations. The network 7i-4-4-4-lo presented the 
largest dynamical range between the proton and iron control sample medians 
among all the network architectures tested. The cuts of Table 1 and Table 4 
reduced 2/3 the number of selected events. Therefore, we mixed our QGSJet 
and the Sibyll samples first to keep a higher number of events for training. 
This means that the results presented here are a worst case scenario. 
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Fig. 8. Mass average for 1000 samples of 300 events each simulated with network 
7i-4-4-4-lo, for (a) hybrid events (i.e., with X max ) and (b) pure surface events (i.e., 
without X max ). The vertical thick lines are, the true values of the average spectral 
mass for each v. 
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Fig. 9. The same of Fig. 8. (a) - Sibyll, hybrid detection; (b) - Sibyll, surface de- 
tection; (c) - QGSJet, hybrid detection; (d) - QGSJet, surface detection. Statistical 
parameters from these distributions are on Table 5. 

The errors in the average value of the distributions is of the order of 7% for 
hybrid (Fig. 8. a) and 12% for surface (Fig. 8.b) events (see Table 5). 

The same analysis, but discriminating by hadronic interaction model is pre- 
sented in Fig. 9. The training set had 12,000 events (1,500 for each nucleus) 
and control samples had 2,000 events for each nucleus. The upper panels show 
the networks trained and simulated with Sibyll events, while the lower panels 
show the same but for QGSJet events. 

As in the case of the proton-iron mixture, the networks were able to classify 
better when trained with Sibyll events than with QGSJet. 
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Table 5 

Statistical parameters of distributions from Fig. 8 and Fig. 9. (At) is the true av- 
erage distribution (vertical thick lines), \fa* is the standard deviation error, is 
standart mean error. 



Hybrid detec. Surface detec. 



H. model 


V 


(At) 


V^ 1 


®(%) 


v 7 ^ 


7®(%) 


QGSJet 


-0.5 


21.45 


1.59 


7 


2.79 


13 


+ 


+0.0 


28.50 


1.90 


7 


3.45 


12 


Sibyll 


+0.5 


33.68 


2.06 


6 


3.96 


12 




+1.0 


37.35 


2.28 


6 


4.29 


11 




+2.0 


42.00 


2.42 


6 


4.82 


11 


Sibyll 


-0.5 


21.45 


0.81 


4 


1.32 


6 




+0.0 


28.50 


0.99 


3 


1.61 


6 




+0.5 


33.68 


1.14 


3 


1.84 


5 




+1.0 


37.35 


1.13 


3 


1.92 


5 




+2.0 


42.00 


1.16 


3 


2.10 


5 


QGSJet 


-0.5 


21.45 


1.74 


8 


2.20 


10 




+0.0 


28.50 


1.99 


7 


2.64 


10 




+0.5 


33.68 


2.28 


7 


3.29 


10 




+1.0 


37.35 


2.43 


7 


3.40 


9 




+2.0 


42.00 


2.55 


6 


3.61 


9 



The errors on the determination of the average mass of the distributions 
are given in Table 5. For Sibyll(QGSJet) events, they were of the order of 
6%(10%), for surface detection, and can go down by as much as a factor of 
two for hybrid events. 

With the aim of testing the effect of our lack of knowledge of the true hadronic 
interaction model, we trained NN with events of one interaction model and 
simulated them with events of the other interaction model. The tested net- 
works were the same as in the previous example. 

The networks trained with Sibyll events and tested with QGSJet events pre- 
sented smaller standard deviation and standard mean error than those trained 
with QGSJet and simulated with Sibyll (see, Table 6). 
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In the worst possible scenario, in which we have only surface information and 
employ the wrong hadronic interaction model, our neural networks can still 
calculate the average of the mass spectrum with an error inferior to 20%. 

These results show, surprisingly, that NN are only weakly dependent of the 
hadronic interaction model used. 

Table 6 

The same of Table 5, but now the networks were trained and tested with events of 
different hadronic models. 



Had. model 






Hybrid det. 


Surface det. 


Trained Tested 


V 


(At) 










Sibyll QGSJet 


-0.5 


21.45 


2.43 


n 


3.58 


17 




+0.0 


28.50 


2.88 


10 


3.93 


14 




+0.5 


33.68 


3.02 


9 


4.21 


13 




+ 1.0 


37.35 


3.36 


9 


4.25 


11 




+2.0 


42.00 


3.38 


8 


4.38 


10 


QGSJet Sibyll 


-0.5 


21.45 


3.43 


16 


4.57 


21 




+0.0 


28.50 


4.98 


17 


5.18 


18 




+0.5 


33.68 


5.94 


18 


5.90 


18 




+ 1.0 


37.35 


6.43 


17 


6.48 


17 




+2.0 


42.00 


6.91 


16 


6.72 


16 



5 Conclusion 

In the present work we showed the potential of both PCA and Neural Networks 
applied to mass composition analysis of UHECRs. 

In particular, we studied mass discrimination in three kinds of cosmic ray mix- 
tures: (a) photon-nuclei, (b) proton-iron and (c) a mass spectrum comprising 
eight nuclei running from proton to iron. 

In cases (a) and (b) we showed that event by event classification was possible 
while, in a multi-component mixture, only a statistical characterization of the 
mass spectrum is possible. 

The PCA method was applied to the problem of separating photons from 
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hadrons. In this way, and for hybrid events, we were able to attain a proba- 
bility of misidentification of ~ 3% for photons ~ 4% for hadrons (p and Fe). 
Nevertheless, we were unable to separate nuclei among themselves by using 
PCA alone. Neural networks were used to attack this problem. 

In the case of binary mixtures of proton and iron, we were able to design neural 
networks whose misclassification probabilities are 12%(1%) and 10%(1%) for 
proton and iron with surface (hybrid) detection, for Sibyll training and testing. 
For QGSJet events, on the other hand, the misclassification probabilities are 
15%(8%) and 14%(9%) for proton and iron with surface(hybrid) detection. 
The QGSJet hadronic model generates noisier shower than Sibyll because of 
its higher multiplicity. Apparently, the networks had more difficulty in train 
and testing the noisier QGSJet event sets. 

Since the true hadronic interaction model at UHECR energies is still very 
much under debate, we also trained networks with events of one hadronic 
model and simulated them with an independent control sample made up of 
events of the other hadronic model. The tests showed that the networks are 
able to classify correctly, at least 70% of the events (iron or proton, surface 
or hybrid detection), independently of the assumptions made on the hadronic 
interaction model. 



For a mixture of eight nuclei, we were unable to design a network capable of 
discrimination on an event-by-event bases. Therefore, we attempted only the 
statistical determination of the average atomic mass of different mass spectra. 
As an example, we show results here for composition spectra of the form 
dA/dN oc A v , with v = -0.5, 0.0, 0.5, 1.0 and 2.0. 

For networks trained with a mixture of QGSJet and Sibyll events, the er- 
rors in the average of mass spectrum distributions were of order of 12% (7%) 
for surface (hybrid) detection. When separating the events by hadronic model, 
the errors in the determination of the average mass of the spectra went down 
to ~ 6%(3%) for Sibyll events, and ~ 10%(7%) for QGSJet events, for sur- 
face(hybrid) detection. 

A weak dependence with hadronic interaction model was confirmed again for 
the NN, by training and testing the same network with different hadronic 
interaction models. The errors incurred in the determination of the average 
spectral mass, in this case, were less than 20%, for surface or hybrid detection 
regardless of the assumed model. 
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